In [4]:
import geopandas as gpd
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
from matplotlib_scalebar.scalebar import ScaleBar

import osmnx as ox

# Set display option to show all columns
pd.set_option('display.max_columns', None)

Lab 1: Geographic Data and Spatial Models¶

Welcome to today's lab session!

In this session, we will delve into the functionalities of two powerful Python libraries: Geopandas and GurobiPy.

  1. Geopandas:

Geopandas is a Python library that facilitates working with geospatial data. It extends the capabilities of Pandas, a widely used data manipulation library, to support geographic data types and operations. Throughout this lab, we will learn how to read, write, visualize, and analyze geographic data using Geopandas.

  1. GurobiPy:

GurobiPy, or Gurobi Optimization Python Interface, is a Python package that provides access to the Gurobi Optimizer, a powerful mathematical optimization solver. Gurobi is widely used in various industries for solving linear programming, integer programming, quadratic programming, and other optimization problems. In this lab, we will explore the basics of using GurobiPy to formulate and solve optimization problems.

  1. Knapsack Problem:

As part of our exploration, we will delve into a classic optimization problem known as the Knapsack Problem. This problem involves selecting a combination of items to maximize value while staying within a given weight constraint. We will apply the concepts learned from Geopandas and GurobiPy to solve practical scenarios, such as selecting parcels to buy under various conditions, resembling real-world decision-making scenarios.

Throughout the lab, we encourage you to actively engage in hands-on exercises, experiment with code snippets, and explore additional functionalities of Geopandas and GurobiPy. Feel free to ask questions, collaborate with your peers, and enjoy the journey of discovering the capabilities of these powerful Python libraries.

Lab Structure¶

Lab 1 is due in two weeks and consists of two chapters.
Chapter 1 is to figure out which parcel you'd like to buy within given budget.
Chpater 2 is to figure out the distribution of the distance between wildfire ignitions and major roads.

Chapter 1. Which parcel do you want to buy?¶

Chapter 1 performs an exploratory data analysis on Parcel Data through Geopandas.

Then, utilizing given information, a Knapsack problem will be solved using GurobiPy solver.
You'll get to know which parcels you'd like to buy with given budget.

Step 1. Explore Santa Barbara Parcel Data¶

In [5]:
# Read parcel data in
parcels = gpd.read_file("Lab1Data/parcel_SB.shp")

# Plot your parcel data
parcels.plot()
Out[5]:
<Axes: >
No description has been provided for this image
In [6]:
parcels
Out[6]:
OBJECTID APN LAYER Situs1 Situs2 Acreage LandUse UseCode PctTransf LandValue StrImpr LivImpr NetSecVal Net_Impr YearBuilt Bedrooms Bathrooms geometry
0 131101 041-350-024 Ground 2315 EDGEWATER WAY SANTA BARBARA, CA 93109 0.52 SINGLE FAMILY RESIDENCE 0100 100.0 288614 331840 0 620454 331840 1967 4 3.00 POLYGON ((6039997.884 1971525.657, 6039982.106...
1 131102 041-350-012 Ground 2307 EDGEWATER WAY SANTA BARBARA, CA 93109 0.18 SINGLE FAMILY RESIDENCE 0100 26.6 550902 1085141 0 1636043 1085141 2006 3 3.50 POLYGON ((6040044.305 1971477.086, 6040032.277...
2 131103 041-350-015 Ground 2211 EDGEWATER WAY SANTA BARBARA, CA 93109 0.59 SINGLE FAMILY RESIDENCE 0100 100.0 1641000 175000 0 1816000 175000 1954 2 2.50 POLYGON ((6040237.820 1971288.705, 6040179.849...
3 131104 041-350-016 Ground 2201 EDGEWATER WAY SANTA BARBARA, CA 93109 0.50 SINGLE FAMILY RESIDENCE 0100 100.0 1644611 1482404 0 3127015 1482404 1948 3 3.00 POLYGON ((6040320.008 1971271.353, 6040268.735...
4 131105 041-350-017 Ground 2111 EDGEWATER WAY SANTA BARBARA, CA 93109 0.76 SINGLE FAMILY RESIDENCE 0100 0.0 63587 156682 0 213269 156682 1950 4 3.00 POLYGON ((6040450.139 1971243.880, 6040398.978...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1867 250105 045-290-008 Condo Third Floor 30 BARRANCA AVE 3 SANTA BARBARA, CA 93109 0.00 CONDOS,COMMUNITY APT PROJS 0323 100.0 46816 70592 0 117408 70592 1973 2 1.75 POLYGON ((6048328.471 1972754.300, 6048325.112...
1868 250106 045-290-009 Condo Third Floor 36 BARRANCA AVE 6 SANTA BARBARA, CA 93109 0.00 CONDOS,COMMUNITY APT PROJS 0323 100.0 750000 668000 0 1418000 668000 1973 2 1.75 POLYGON ((6048287.470 1972741.338, 6048291.580...
1869 250107 045-290-010 Condo Third Floor 36 BARRANCA AVE 5 SANTA BARBARA, CA 93109 0.00 CONDOS,COMMUNITY APT PROJS 0323 100.0 309553 206366 0 515919 206366 1973 2 2.00 POLYGON ((6048283.551 1972700.429, 6048271.984...
1870 250108 045-290-011 Condo Third Floor 40 BARRANCA AVE 3 SANTA BARBARA, CA 93109 0.00 CONDOS,COMMUNITY APT PROJS 0323 100.0 124884 187149 0 312033 187149 1973 2 1.75 POLYGON ((6048234.146 1972677.397, 6048238.386...
1871 250109 045-330-008 Condo Third Floor 101 OCEANO AVE 5 SANTA BARBARA, CA 93109 0.00 CONDOS,COMMUNITY APT PROJS 0332 100.0 305105 298615 0 603720 298615 1961 2 1.75 POLYGON ((6048280.057 1972978.876, 6048296.690...

1872 rows × 18 columns

In [7]:
# Check the data types of your parcel data
parcels.dtypes
Out[7]:
OBJECTID        int64
APN            object
LAYER          object
Situs1         object
Situs2         object
Acreage       float64
LandUse        object
UseCode        object
PctTransf     float64
LandValue       int64
StrImpr         int64
LivImpr         int64
NetSecVal       int64
Net_Impr        int64
YearBuilt      object
Bedrooms        int64
Bathrooms     float64
geometry     geometry
dtype: object
In [8]:
# Check the Coordiante System of your parcel data
parcels.crs
Out[8]:
<Projected CRS: EPSG:2229>
Name: NAD83 / California zone 5 (ftUS)
Axis Info [cartesian]:
- X[east]: Easting (US survey foot)
- Y[north]: Northing (US survey foot)
Area of Use:
- name: United States (USA) - California - counties Kern; Los Angeles; San Bernardino; San Luis Obispo; Santa Barbara; Ventura.
- bounds: (-121.42, 32.76, -114.12, 35.81)
Coordinate Operation:
- name: SPCS83 California zone 5 (US Survey feet)
- method: Lambert Conic Conformal (2SP)
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich
In [9]:
# Plot your parcel data with the column "LandValue"
ax = parcels.plot("LandValue", legend=True, figsize=(15,5))
# Add a scale bar
plt.title("Parcel LandValue")
scalebar = ScaleBar(1, scale_formatter = lambda value, unit: f"{value}000 ft")
ax.add_artist(scalebar)
Out[9]:
<matplotlib_scalebar.scalebar.ScaleBar at 0x143f89b10>
No description has been provided for this image
In [10]:
# Plot with a column "NetSecVal", Net Secured Value for each parcel.
ax = parcels.plot("NetSecVal", legend=True, figsize=(15,5))
# Add a scale bar
plt.title("Parcel Net Secured Value")
scalebar = ScaleBar(1, scale_formatter = lambda value, unit: f"{value}000 ft")
ax.add_artist(scalebar)
Out[10]:
<matplotlib_scalebar.scalebar.ScaleBar at 0x143f4c850>
No description has been provided for this image
In [11]:
# Visualize another column, "YearBuilt" and check out a problem
ax = parcels.plot("YearBuilt", legend=True, figsize=(15,5))
# Add a scale bar
plt.title("Parcel YearBuilt")
scalebar = ScaleBar(1, scale_formatter = lambda value, unit: f"{value}000 ft")
ax.add_artist(scalebar)
Out[11]:
<matplotlib_scalebar.scalebar.ScaleBar at 0x14400c590>
No description has been provided for this image
In [12]:
# Yes, there are two problems.
    # 1. Legend is too long. 
    # 2. Some parcels are disappeared.

# Therefore, let's see what's happening in the YearBuilt column.
# Firstly, let's print out the raw values.

parcels["YearBuilt"]
Out[12]:
0       1967
1       2006
2       1954
3       1948
4       1950
        ... 
1867    1973
1868    1973
1869    1973
1870    1973
1871    1961
Name: YearBuilt, Length: 1872, dtype: object
In [13]:
# I see. They are in string type, not in numeric type. 
    # That's why the legend was categorical, not a continuous colorbar.

# So we are going to replace the YearBuilt columns, with the same values in float data type.
# You can convert the data type with the method, .astype()
parcels["YearBuilt"]=parcels["YearBuilt"].astype("float")
In [14]:
parcels["YearBuilt"]
Out[14]:
0       1967.0
1       2006.0
2       1954.0
3       1948.0
4       1950.0
         ...  
1867    1973.0
1868    1973.0
1869    1973.0
1870    1973.0
1871    1961.0
Name: YearBuilt, Length: 1872, dtype: float64
In [15]:
# Also, as there were some missing parcels, let's check if there's any Null values in YearBuilt column.
parcels["YearBuilt"].isnull().sum()
Out[15]:
135
In [16]:
# Oh.. there were 135 null values, that's why there are some blanks. 135 parcels out of 1872 are true. 
    # This is a data limitation or there must be some reason of the null values. 
    # So let's accept the fact that there exist null values.

# Now, plot the parcels GeoDataFrame with the "YearBuilt" column.
# Is there still something non-plausible..?
ax = parcels.plot("YearBuilt", legend=True, figsize=(15,5))
# Add a scale bar
plt.title("Parcel YearBuilt")
scalebar = ScaleBar(1, scale_formatter = lambda value, unit: f"{value}000 ft")
ax.add_artist(scalebar)
Out[16]:
<matplotlib_scalebar.scalebar.ScaleBar at 0x143ed0110>
No description has been provided for this image
In [17]:
# How can there be some buildings built so long ago..? Year 0..? 2024 years ago?!
# Let's check out the min and max values of the YearBuilt column values.
parcels["YearBuilt"].min(), parcels["YearBuilt"].max()
Out[17]:
(0.0, 2015.0)
In [18]:
# Certainly, it must mean that the parcels with the year value 0 are actually Null values. 
# So let's replace them with Null, so the plot will show better spatial variation.
parcels.loc[parcels["YearBuilt"]==0, "YearBuilt"] = None
In [19]:
parcels.loc[parcels["YearBuilt"]==0]
Out[19]:
OBJECTID APN LAYER Situs1 Situs2 Acreage LandUse UseCode PctTransf LandValue StrImpr LivImpr NetSecVal Net_Impr YearBuilt Bedrooms Bathrooms geometry
In [20]:
### Your Turn!! Let's visualize the YearBuilt column with ScaleBar, once again. Does it look plausible?
ax = parcels.plot("YearBuilt", legend=True, figsize=(15,5))
# Add a scale bar
plt.title("Parcel YearBuilt")
scalebar = ScaleBar(1, scale_formatter = lambda value, unit: f"{value}000 ft")
ax.add_artist(scalebar)
Out[20]:
<matplotlib_scalebar.scalebar.ScaleBar at 0x143faaa50>
No description has been provided for this image

Step 2. Subset the Residential Parcels for Optimization¶

In [21]:
# Let's check what the dataset has in "LandUse" column using the .value_counts() method.
    # It returns the unique values and frequency.
parcels["LandUse"].value_counts()
Out[21]:
LandUse
SINGLE FAMILY RESIDENCE              1560
CONDOS,COMMUNITY APT PROJS            182
RESIDENTIAL INCOME, 2-4 UNITS          42
APARTMENTS, 5 OR MORE UNITS            29
VACANT                                 25
PARKS                                   5
OFFICE BUILDINGS, MULTI-STORY           5
PARKING LOTS                            3
SUPERMARKETS                            2
RESTAURANTS,BARS                        1
SERVICE STATIONS                        1
RETAIL STORES, SINGLE STORY             1
CHURCHES, RECTORY                       1
STORE AND OFFICE COMBINATION            1
MIXED USE-COMMERCIAL/RESIDENTIAL        1
PROFESSIONAL BUILDINGS                  1
OFFICE BUILDINGS, SINGLE STORY          1
COMMERCIAL AND OFFICE CONDOS,PUDS       1
Name: count, dtype: int64
In [67]:
parcels
Out[67]:
OBJECTID APN LAYER Situs1 Situs2 Acreage LandUse UseCode PctTransf LandValue StrImpr LivImpr NetSecVal Net_Impr YearBuilt Bedrooms Bathrooms geometry
0 130716 075-010-022 Ground 6875 EL COLEGIO RD GOLETA, CA 93117 9.60 SCHOOLS 7200 0.0 0 0 0 0 0 None 0 0.0 POLYGON ((5997851.219 1979476.854, 5998251.155...
1 130717 075-010-039 Ground None None 0.83 VACANT 0090 100.0 0 0 0 0 0 None 0 0.0 POLYGON ((5998411.116 1979466.012, 5998407.771...
2 130718 075-010-040 Ground None None 0.35 VACANT 0090 100.0 0 0 0 0 0 None 0 0.0 POLYGON ((5998631.052 1979460.936, 5998407.771...
3 130719 075-010-037 Ground None None 21.60 VACANT 0090 100.0 0 0 0 0 0 None 0 0.0 POLYGON ((5998749.878 1979458.194, 5998727.559...
4 130720 075-010-028 Ground 6795 EL COLEGIO RD GOLETA, CA 93117 11.78 VACANT 0090 0.0 0 0 0 0 0 None 0 0.0 POLYGON ((5999395.829 1979443.282, 5999388.511...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
912 244502 075-230-001 Condo First Floor 6636 DEL PLAYA DR # 1 GOLETA, CA 93117 0.03 CONDOS,COMMUNITY APT PROJS 0300 100.0 308718 308718 0 617436 308718 2003 1 1.0 POLYGON ((6000659.415 1976779.651, 6000658.790...
913 244503 075-230-003 Condo First Floor 6636 DEL PLAYA DR # 3 GOLETA, CA 93117 0.03 CONDOS,COMMUNITY APT PROJS 0300 100.0 308718 308718 0 617436 308718 2003 1 1.0 POLYGON ((6000659.596 1976790.249, 6000659.432...
914 248854 075-230-002 Condo Second Floor 6636 DEL PLAYA DR # 2 GOLETA, CA 93117 0.03 CONDOS,COMMUNITY APT PROJS 0300 100.0 308718 308718 0 617436 308718 2003 1 2.0 POLYGON ((6000646.610 1976774.894, 6000646.553...
915 248855 075-230-004 Condo Second Floor 6636 DEL PLAYA DR # 4 GOLETA, CA 93117 0.03 CONDOS,COMMUNITY APT PROJS 0300 100.0 308718 308718 0 617436 308718 2003 1 2.0 POLYGON ((6000659.898 1976810.348, 6000659.596...
916 239415 075-020-039 Right of Way (Private) None None 0.52 HIGHWAYS AND STREETS 8800 100.0 154 0 0 154 0 None 0 0.0 POLYGON ((6000356.910 1978984.122, 6000353.435...

917 rows × 18 columns

In [22]:
# Let's take a subset of the original parcels data, sothat only residential parcels can be included.
res_parcels = parcels.loc[parcels["LandUse"].isin(["SINGLE FAMILY RESIDENCE", 
                                                   "CONDOS,COMMUNITY APT PROJS",
                                                   "RESIDENTIAL INCOME, 2-4 UNITS",
                                                   "APARTMENTS, 5 OR MORE UNITS"])]
res_parcels["LandUse"].value_counts()
# In .loc[], the first argument selects rows based on a condition, and the optional second argument specifies columns.
# The .isin() function generates a binary series indicating whether each value is present in the given array.
Out[22]:
LandUse
SINGLE FAMILY RESIDENCE          1560
CONDOS,COMMUNITY APT PROJS        182
RESIDENTIAL INCOME, 2-4 UNITS      42
APARTMENTS, 5 OR MORE UNITS        29
Name: count, dtype: int64
In [23]:
# Also, let's say that we want the parcels where YearBuilt is identified.
# Check out how many parcels don't have YearBuilt identified.
res_parcels["YearBuilt"].isnull().sum()
Out[23]:
77
In [24]:
res_parcels = res_parcels.reset_index(drop=True)
In [25]:
# Then, let's drop them. 77 parcels don't have YearBuilt values.
res_parcels.loc[res_parcels["YearBuilt"].isnull()].index
res_parcels = res_parcels.drop(res_parcels.loc[res_parcels["YearBuilt"].isnull()].index).reset_index(drop=True)
In [26]:
# Check the output GeoDataFrame, anything non-plausible?
len(res_parcels)
Out[26]:
1736
In [27]:
# How would you fix this? 
#There is one parcel with zero bedrooms. Want to create a column that shows whether parcel has zero bedrooms or not. 
res_parcels["Bedroom_"] = res_parcels["Bedrooms"] > 0
res_parcels
Out[27]:
OBJECTID APN LAYER Situs1 Situs2 Acreage LandUse UseCode PctTransf LandValue StrImpr LivImpr NetSecVal Net_Impr YearBuilt Bedrooms Bathrooms geometry Bedroom_
0 131101 041-350-024 Ground 2315 EDGEWATER WAY SANTA BARBARA, CA 93109 0.52 SINGLE FAMILY RESIDENCE 0100 100.0 288614 331840 0 620454 331840 1967.0 4 3.00 POLYGON ((6039997.884 1971525.657, 6039982.106... True
1 131102 041-350-012 Ground 2307 EDGEWATER WAY SANTA BARBARA, CA 93109 0.18 SINGLE FAMILY RESIDENCE 0100 26.6 550902 1085141 0 1636043 1085141 2006.0 3 3.50 POLYGON ((6040044.305 1971477.086, 6040032.277... True
2 131103 041-350-015 Ground 2211 EDGEWATER WAY SANTA BARBARA, CA 93109 0.59 SINGLE FAMILY RESIDENCE 0100 100.0 1641000 175000 0 1816000 175000 1954.0 2 2.50 POLYGON ((6040237.820 1971288.705, 6040179.849... True
3 131104 041-350-016 Ground 2201 EDGEWATER WAY SANTA BARBARA, CA 93109 0.50 SINGLE FAMILY RESIDENCE 0100 100.0 1644611 1482404 0 3127015 1482404 1948.0 3 3.00 POLYGON ((6040320.008 1971271.353, 6040268.735... True
4 131105 041-350-017 Ground 2111 EDGEWATER WAY SANTA BARBARA, CA 93109 0.76 SINGLE FAMILY RESIDENCE 0100 0.0 63587 156682 0 213269 156682 1950.0 4 3.00 POLYGON ((6040450.139 1971243.880, 6040398.978... True
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1731 250105 045-290-008 Condo Third Floor 30 BARRANCA AVE 3 SANTA BARBARA, CA 93109 0.00 CONDOS,COMMUNITY APT PROJS 0323 100.0 46816 70592 0 117408 70592 1973.0 2 1.75 POLYGON ((6048328.471 1972754.300, 6048325.112... True
1732 250106 045-290-009 Condo Third Floor 36 BARRANCA AVE 6 SANTA BARBARA, CA 93109 0.00 CONDOS,COMMUNITY APT PROJS 0323 100.0 750000 668000 0 1418000 668000 1973.0 2 1.75 POLYGON ((6048287.470 1972741.338, 6048291.580... True
1733 250107 045-290-010 Condo Third Floor 36 BARRANCA AVE 5 SANTA BARBARA, CA 93109 0.00 CONDOS,COMMUNITY APT PROJS 0323 100.0 309553 206366 0 515919 206366 1973.0 2 2.00 POLYGON ((6048283.551 1972700.429, 6048271.984... True
1734 250108 045-290-011 Condo Third Floor 40 BARRANCA AVE 3 SANTA BARBARA, CA 93109 0.00 CONDOS,COMMUNITY APT PROJS 0323 100.0 124884 187149 0 312033 187149 1973.0 2 1.75 POLYGON ((6048234.146 1972677.397, 6048238.386... True
1735 250109 045-330-008 Condo Third Floor 101 OCEANO AVE 5 SANTA BARBARA, CA 93109 0.00 CONDOS,COMMUNITY APT PROJS 0332 100.0 305105 298615 0 603720 298615 1961.0 2 1.75 POLYGON ((6048280.057 1972978.876, 6048296.690... True

1736 rows × 19 columns

In [28]:
# Then, with the residential parcels having YearBuilt identified,
# let's check the spatial distribution of the number of Bedrooms. Plot with a column "Bedrooms"
res_parcels.plot("Bedroom_", legend = True)
Out[28]:
<Axes: >
No description has been provided for this image
In [29]:
# Also, let's check the spatial distribution of the number of Bathrooms.
res_parcels.plot("Bathrooms", figsize=(15,5), legend=True, categorical=True, cmap="YlOrRd")
plt.axis("off")
plt.title("The number of Bathrooms")
Out[29]:
Text(0.5, 1.0, 'The number of Bathrooms')
No description has been provided for this image
In [ ]:
 
In [30]:
# Let's find the parcels, which doesn't have any Bedroom.
res_parcels.loc[res_parcels["Bedroom_"] == False]
Out[30]:
OBJECTID APN LAYER Situs1 Situs2 Acreage LandUse UseCode PctTransf LandValue StrImpr LivImpr NetSecVal Net_Impr YearBuilt Bedrooms Bathrooms geometry Bedroom_
1139 155252 045-185-006 Ground 1419 SHORELINE DR SANTA BARBARA, CA 93109 0.32 SINGLE FAMILY RESIDENCE 0100 100.0 77851 128819 0 199670 128819 1946.0 0 2.0 POLYGON ((6044967.574 1971082.045, 6044995.993... False
In [31]:
# Let's create a new column, "bedroom" in the GeoDataFrame. (column name is case-sensitive)
    # The "bedroom" column will have a binary (True/False) value indicating if the parcel has greater than zero Bedrooms.
    # Parcel 1139 has no bedroom.

    
# Now visualize the binary column, "bedroom".

Step 3. Optimization using GurobiPy: Knapsack Problem¶

Knapsack Problem Equations¶

Notations:

The objective is to maximize the total value of items selected while keeping the total weight within the knapsack capacity.

Let:

  • ( $n$ ) be the number of items.
  • ( $v_i$ ) be the value of the ( i )th item.
  • ( $w_i$ ) be the weight of the ( i )th item.
  • ( $W$ ) be the maximum capacity of the knapsack.

where:

  • ( $x_i$ ) is a binary decision variable:
    • ( $x_i$ = 1 ) if the ( i )th item is selected.
    • ( $x_i$ = 0 ) otherwise.

Complete Formulation:

The complete formulation of the Knapsack problem is as follows:

Maximize:

$ \sum_{i=1}^{n} v_i \cdot x_i $

Subject to:

$ \sum_{i=1}^{n} w_i \cdot x_i \leq W $

$ x_i \in \{0, 1\}, \quad \forall i = 1, 2, \ldots, n $

In [32]:
list(range(len(res_parcels)))[-1]
Out[32]:
1735
In [33]:
# Let's import the GurobiPy into this Notebook.
import gurobipy as gp
from gurobipy import GRB

# Define the data (weights, values, and knapsack capacity)
objective_weights = res_parcels['Bedrooms']  # Example weights of items
constraint_weights = res_parcels['LandValue']   # Example values of items
BUDGET = 1000000            # Knapsack capacity
#There is a better way(?) You want to maximize number of bedrooms with a budget of 100000 dollars

# Create a new Gurobi model
m = gp.Model("knapsack")

# Add decision variables for each item (binary variables indicating whether to include the item in the knapsack or not)
x = m.addVars(len(res_parcels), vtype=GRB.BINARY, name="x")

# Set objective function: maximize total value
m.setObjective(sum(x[i]*objective_weights[i] for i in range(len(res_parcels))), sense=GRB.MAXIMIZE)

# Add constraint: total weight of selected items must not exceed knapsack capacity (BUDGET)
# m.addConstr(sum(x[i]*constraint_weights[i] for i in range(len(res_parcels)))<= BUDGET)

# Optimize the model
m.optimize()
Restricted license - for non-production use only - expires 2025-11-24
Gurobi Optimizer version 11.0.1 build v11.0.1rc0 (mac64[arm] - Darwin 22.1.0 22A400)

CPU model: Apple M1
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 0 rows, 1736 columns and 0 nonzeros
Model fingerprint: 0x39170ef2
Variable types: 0 continuous, 1736 integer (1736 binary)
Coefficient statistics:
  Matrix range     [0e+00, 0e+00]
  Objective range  [1e+00, 2e+01]
  Bounds range     [1e+00, 1e+00]
  RHS range        [0e+00, 0e+00]
Found heuristic solution: objective 5271.0000000

Explored 0 nodes (0 simplex iterations) in 0.00 seconds (0.00 work units)
Thread count was 1 (of 8 available processors)

Solution count 1: 5271 

Optimal solution found (tolerance 1.00e-04)
Best objective 5.271000000000e+03, best bound 5.271000000000e+03, gap 0.0000%
In [34]:
#NEW NAPSACK PROBLEM QUESTION #10
# Let's import the GurobiPy into this Notebook.
import gurobipy as gp
from gurobipy import GRB

# Define the data (weights, values, and knapsack capacity)
objective_weights = res_parcels['Bathrooms']  # Example weights of items
constraint_weights = res_parcels['LandValue']   # Example values of items
BUDGET = 1000000            # Knapsack capacity
#There is a better way(?) You want to maximize number of bedrooms with a budget of 100000 dollars

# Create a new Gurobi model
m = gp.Model("knapsack")

# Add decision variables for each item (binary variables indicating whether to include the item in the knapsack or not)
x = m.addVars(len(res_parcels), vtype=GRB.BINARY, name="x")

# Set objective function: maximize total value
m.setObjective(sum(x[i]*objective_weights[i] for i in range(len(res_parcels))), sense=GRB.MAXIMIZE)

# Add constraint: total weight of selected items must not exceed knapsack capacity (BUDGET)
# m.addConstr(sum(x[i]*constraint_weights[i] for i in range(len(res_parcels)))<= BUDGET)

# Optimize the model
m.optimize()
Gurobi Optimizer version 11.0.1 build v11.0.1rc0 (mac64[arm] - Darwin 22.1.0 22A400)

CPU model: Apple M1
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 0 rows, 1736 columns and 0 nonzeros
Model fingerprint: 0xbec9e735
Variable types: 0 continuous, 1736 integer (1736 binary)
Coefficient statistics:
  Matrix range     [0e+00, 0e+00]
  Objective range  [8e-01, 2e+01]
  Bounds range     [1e+00, 1e+00]
  RHS range        [0e+00, 0e+00]
Found heuristic solution: objective 3441.5300000

Explored 0 nodes (0 simplex iterations) in 0.00 seconds (0.00 work units)
Thread count was 1 (of 8 available processors)

Solution count 1: 3441.53 

Optimal solution found (tolerance 1.00e-04)
Best objective 3.441530000000e+03, best bound 3.441530000000e+03, gap 0.0000%
In [ ]:
 
In [35]:
! pip install gurobipy
Requirement already satisfied: gurobipy in /opt/anaconda3/lib/python3.11/site-packages (11.0.1)
In [36]:
# Now, add the model result to the GeoDataFrame
res_parcels["selected"] = [x[i].x > 0.5 for i in range(len(res_parcels))]

# Then Visualize the result
res_parcels.plot("selected", figsize=(15,5), legend=True, cmap="rainbow", edgecolor="black")
plt.axis("off")
plt.title("Selected Residential Parcels")
Out[36]:
Text(0.5, 1.0, 'Selected Residential Parcels')
No description has been provided for this image
In [37]:
# Using folium library, create an interactive plot of showing the optimization result.
import folium

res_parcels = res_parcels.to_crs(4326)
# Create a Folium map centered around the mean latitude and longitude of the GeoDataFrame
map_center = [res_parcels.unary_union.centroid.y, res_parcels.unary_union.centroid.x]
m = folium.Map(location=map_center, zoom_start=15, control_scale=True)


# Define a style function to set colors based on the binary column
def style_function(feature):
    color = 'purple' if feature['properties']['selected'] == 1 else 'pink'
    return {'fillColor': color, 'color': color, 'weight': 1, 'fillOpacity': 0.6}

tooltip = folium.GeoJsonTooltip(
    fields=["APN", "selected", "LandValue", "Bedrooms"]
)


# Add GeoDataFrame to the map with the defined style and tooltip functions
folium.GeoJson(
    res_parcels,
    style_function=style_function,
    tooltip=tooltip
).add_to(m)

# Display the map
m
Out[37]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Chapter 2. Wildfires and Major Roads¶

Step 1. Download the Road Network Data in Santa Barbara¶

Note: OSMnx libarary provides the filtering functionality when downloading a graph.
However, this time we will not use it to practice data processing in Geopandas.

Hint: Santa Barbara Bounding Coordinates
- outhernmost Latitude: 34.2714° N
- Northernmost Latitude: 35.5295° N
- Westernmost Longitude: -120.6976° W
- Easternmost Longitude: -118.9600° W

In [38]:
north, south, east, west = 35.5295, 34.2714, -118.9600, -120.6976

# Fetch the road network
# It takes some time.. We have lots of roads in a county
graph = ox.graph_from_bbox(north, south, east, west, 
                           network_type='drive')
#                           custom_filter='["highway"~"primary|trunk|motorway"]')

# Plot the road network
ox.plot_graph(graph)

# Convert it into GeoDataFrame
/var/folders/0f/r37lnqs969n4hggs14l43w280000gn/T/ipykernel_24414/689972490.py:5: FutureWarning: The `north`, `south`, `east`, and `west` parameters are deprecated and will be removed in the v2.0.0 release. Use the `bbox` parameter instead. See the OSMnx v2 migration guide: https://github.com/gboeing/osmnx/issues/1123
  graph = ox.graph_from_bbox(north, south, east, west,
No description has been provided for this image
Out[38]:
(<Figure size 800x800 with 1 Axes>, <Axes: >)
In [39]:
roads = ox.graph_to_gdfs(graph, nodes=False, edges=True)
roads.plot()
Out[39]:
<Axes: >
No description has been provided for this image
In [40]:
# Check the unique values in "highway" column to select only Major Roads.
roads['highway'].value_counts()
Out[40]:
highway
residential                               90449
tertiary                                  13276
secondary                                  7565
unclassified                               4555
primary                                    3030
motorway_link                               813
trunk                                       626
motorway                                    590
secondary_link                              368
primary_link                                157
[residential, unclassified]                 146
living_street                               136
[tertiary, residential]                     113
tertiary_link                               112
trunk_link                                   92
[trunk, motorway]                            17
[tertiary, unclassified]                     14
[secondary, tertiary]                        11
road                                          8
[residential, living_street]                  8
[secondary, unclassified]                     4
[primary, motorway]                           4
crossing                                      4
escape                                        3
[secondary, secondary_link]                   3
[residential, motorway_link]                  2
disused                                       2
[secondary, residential]                      2
[secondary, residential, unclassified]        2
[tertiary, residential, unclassified]         2
[tertiary, tertiary_link]                     2
[residential, tertiary_link]                  1
[tertiary_link, residential]                  1
[secondary_link, motorway_link]               1
[primary, motorway, motorway_link]            1
[trunk_link, primary_link]                    1
[trunk_link, motorway_link]                   1
[primary, motorway_link]                      1
Name: count, dtype: int64
In [41]:
# Try to plot the road GeoDataFrame with a column "highway"

roads["highway_"] = roads["highway"].astype("str")
roads.plot("highway_", legend=True, figsize=(20,20))
plt.axis("off")
Out[41]:
(-120.78512326500001, -118.872378035, 34.20852533, 35.59235327)
No description has been provided for this image
In [42]:
# Error? No worries. That's intended.
# Why do you think error happens? Hint: read the error message.
In [43]:
# How can we solve this issue? Let's stringify the column, "highway" and create a new column, "highway_"


# Then plot with the new column, "highway_"
In [44]:
# Now, let's create another column, "major", indicating whether each road segment is part of Major Roads or not.
    # Let's consider the "primary" or "trunk" or "motorway" as Major Roads
    
roads["major"] = roads["highway_"].str.contains("primary|trunk|motorway")

# Then plot with the new column, "major"

roads.plot("major",  legend=True, figsize=(20,20), cmap="Spectral_r")
plt.axis("off")
Out[44]:
(-120.78512326500001, -118.872378035, 34.20852533, 35.59235327)
No description has been provided for this image
In [45]:
# Create a subset GeoDataFrame, "major_roads" by selecting the rows where major column value is True
major_roads = roads.loc[roads["major"]==True]
major_roads
Out[45]:
osmid lanes name highway maxspeed oneway reversed length geometry ref bridge access tunnel junction width highway_ major
u v key
95182337 2923989140 0 298912991 NaN NaN motorway_link NaN True False 19.254 LINESTRING (-119.19059 34.27417, -119.19040 34... NaN NaN NaN NaN NaN NaN motorway_link True
95182344 95247838 0 10729877 2 Santa Paula Freeway motorway 65 mph True False 2908.449 LINESTRING (-119.18669 34.27521, -119.18207 34... CA 126 NaN NaN NaN NaN NaN motorway True
95187446 562638742 0 76561348 4 East Thompson Boulevard primary 35 mph False False 139.588 LINESTRING (-119.29155 34.27827, -119.29146 34... US 101 BUS NaN NaN NaN NaN NaN primary True
902353408 0 76561348 4 East Thompson Boulevard primary 35 mph False True 33.903 LINESTRING (-119.29155 34.27827, -119.29164 34... US 101 BUS NaN NaN NaN NaN NaN primary True
902353746 0 186246480 NaN South Chestnut Street primary NaN False False 27.744 LINESTRING (-119.29155 34.27827, -119.29155 34... NaN NaN NaN NaN NaN NaN primary True
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11746606869 370197560 0 1264315364 NaN NaN motorway_link NaN False False 22.198 LINESTRING (-120.65807 35.28946, -120.65810 35... NaN NaN NaN NaN NaN NaN motorway_link True
11746606870 0 1264315365 NaN NaN motorway_link NaN False True 57.224 LINESTRING (-120.65807 35.28946, -120.65807 35... NaN NaN NaN NaN NaN NaN motorway_link True
11746606870 1969499984 0 186263631 NaN NaN motorway_link NaN False True 15.034 LINESTRING (-120.65807 35.28998, -120.65807 35... NaN NaN NaN NaN NaN NaN motorway_link True
91429025 0 1031908894 NaN NaN motorway_link NaN True False 84.743 LINESTRING (-120.65807 35.28998, -120.65813 35... NaN NaN NaN NaN NaN NaN motorway_link True
11746606869 0 1264315365 NaN NaN motorway_link NaN False False 57.224 LINESTRING (-120.65807 35.28998, -120.65807 35... NaN NaN NaN NaN NaN NaN motorway_link True

5336 rows × 17 columns

In [46]:
# Check the crs of major_roads
major_roads.crs
Out[46]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
In [47]:
# Interactive plot with the major roads. Do you agree that those are major roads?

import folium

# Create a Folium map centered around the mean latitude and longitude of the GeoDataFrame
map_center = [major_roads.unary_union.centroid.y, major_roads.unary_union.centroid.x]
m = folium.Map(location=map_center, zoom_start=7, control_scale=True)


# Define a style function to set colors based on the binary column
def style_function(feature):
    color = 'red' if feature['properties']['selected'] == 1 else 'blue'
    return {'fillColor': color, 'color': color, 'weight': 1, 'fillOpacity': 0.6}

tooltip = folium.GeoJsonTooltip(
    fields=["highway_"]
)


# Add GeoDataFrame to the map with the defined style and tooltip functions
folium.GeoJson(
    major_roads,
    tooltip=tooltip,
    name="Major Roads"
).add_to(m)

folium.LayerControl().add_to(m)
# Display the map
m
Out[47]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Step 2. Relate Ignition locations with Major Roads¶

In [48]:
# Load a shapefile, "Lab1Data/SBC_2007_2021_Wildland_Ignitions.shp" into a variable, "ignitions"

ignitions = gpd.read_file("Lab1Data/SBC_2007_2021_Wildland_Ignitions.shp")
major_roads = major_roads.to_crs(ignitions.crs)
In [49]:
# Plot the major_roads and ignitions together

# Create a new column, "mr_distance" in ignitions GeoDataFrame. 
    # The column contains the distance from each ignition point to the closest major_roads segment.
    # Divide the distance values with 5280 (feet -> miles)
ignitions["mr_distance"] = ignitions.distance(major_roads.unary_union)
ignitions["mr_distance"] = ignitions["mr_distance"]/5280 # feet to miles
A = ignitions.plot("mr_distance", legend=True)
major_roads.plot(ax=A)
Out[49]:
<Axes: >
No description has been provided for this image
In [50]:
# Why it doesn't work? What should you check?


# Yes. Coordinate System. 
In [51]:
# Match the major_roads crs with the ignitions crs.
In [52]:
# Create a new column, "mr_distance" in ignitions GeoDataFrame. 
    # The column contains the distance from each ignition point to the closest major_roads segment.
    # Divide the distance values with 5280 (feet -> miles)
    
    
# Plot the result!
In [53]:
# Now, let's draw a histogram showing the distribution of Distance.

# Create a histogram
plt.figure(figsize=(10, 6))
ignitions.mr_distance.hist(bins=40, color='skyblue', edgecolor='black', alpha=0.7)

# Add titles and labels
plt.title('Histogram of Ignition to Major Road Distance')
plt.xlabel('Distance to Major Road (miles)')
plt.ylabel('Frequency')

# Add grid
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Add a background color
plt.gca().set_facecolor('#f5f5f5')

# Add a legend (optional)
plt.legend(['Distance from Ignition to Major Road'])

# Show plot
plt.show()
No description has been provided for this image
In [54]:
# It's hard to see the result because of an outlier in Channel Island.
# What will be a good way to drop it?

# Drop the furthest ignition point from Major roads, because it's in Channel Islands.

ignitions = ignitions.drop(ignitions.mr_distance.argmax())
In [55]:
# After dropping the outlier, create a histogram once again.
# Create a histogram
plt.figure(figsize=(10, 6))

ignitions.mr_distance.hist(bins=40, color='skyblue', edgecolor='black', alpha=0.7)

# Add titles and labels
plt.title('Histogram of Ignition to Major Road Distance')
plt.xlabel('Ignition to Major Road Distance (miles)')
plt.ylabel('Frequency')

# Add grid
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Add a background color
plt.gca().set_facecolor('#f5f5f5')

# Add a legend (optional)
#plt.legend(['Distance from Ignition to Major Road'])

# Show plot
plt.show()
No description has been provided for this image
In [56]:
# This time, visualize the ignition points with the distance value, and also plot major_roads together.

A = ignitions.plot("mr_distance",legend=True, figsize=(10,10), cmap="RdYlGn_r", legend_kwds={'shrink': 0.3})
major_roads.plot(ax=A, color="darkgrey", lw=0.5)
plt.axis("off")
plt.title("Distance from Ignition to Major Road (miles)")
Out[56]:
Text(0.5, 1.0, 'Distance from Ignition to Major Road (miles)')
No description has been provided for this image

Step 3. Area Burnt and Distance to Major Roads¶

Also, let's check out if the distance to Major Road is related with the area burnt by the fire.
What would be a good way to do so?

In [57]:
# Check out the columns contained in ignitions GeoDataFrame.
# Which column is having the area burnt information?

ignitions.columns
Out[57]:
Index(['Id', 'Alarm_Date', 'Alarm_Time', 'Cause', 'Contain_Da', 'Contain_Ti',
       'Incident_L', 'Incident_N', 'Acres', 'POINT_X', 'POINT_Y', 'DPA',
       'FED_Cause', 'Incident_1', 'geometry', 'mr_distance'],
      dtype='object')
In [58]:
# Draw scatterplot with the mr_distance and area burnt columns.

plt.scatter(ignitions["mr_distance"], ignitions["Acres"])
Out[58]:
<matplotlib.collections.PathCollection at 0x28d894c10>
No description has been provided for this image
In [59]:
# Using a threshold, select a subset of ignitions to exclude outliers.

ignitions_wo_outliers = ignitions.loc[ignitions["mr_distance"] <8]
ignitions_wo_outliers = ignitions.loc[ignitions["Acres"] < 750]
In [60]:
# Draw scatterplot with the mr_distance and area burnt columns.
FIGSIZE = (10,5)
plt.figure(figsize=FIGSIZE)
plt.scatter(ignitions_wo_outliers["mr_distance"], ignitions_wo_outliers["Acres"])
plt.xlabel("Distance to Major Road")
plt.ylabel("Area Burnt in Acres")
plt.title("Scatter plot of Area Burnt in Comparison to Distance to Major Road")

# Add titles and labels.
Out[60]:
Text(0.5, 1.0, 'Scatter plot of Area Burnt in Comparison to Distance to Major Road')
No description has been provided for this image
In [61]:
# Another way is to do IQR

# This time, let's use another way to drop the outliers. Use the IQR based approach.
# You can ask chatgpt for the function.

import numpy as np

def remove_outliers_iqr(data):
    """
    Remove outliers from a dataset using the Interquartile Range (IQR) method.
    
    Parameters:
    - data: A list or NumPy array containing the dataset.
    
    Returns:
    - Cleaned dataset with outliers removed.
    """
    # Convert data to NumPy array for easier manipulation
    data = np.array(data)
    
    # Calculate the first quartile (Q1)
    Q1 = np.percentile(data, 25)
    
    # Calculate the third quartile (Q3)
    Q3 = np.percentile(data, 75)
    
    # Calculate the interquartile range (IQR)
    IQR = Q3 - Q1
    
    # Define the lower bound and upper bound for outlier detection
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    
    # Filter out the outliers
    cleaned_index = (data >= lower_bound) & (data <= upper_bound)
    
    return cleaned_index


ignitions_wo_outliers = ignitions.loc[remove_outliers_iqr(ignitions["Acres"])]
In [62]:
# This time, let's use another way to drop the outliers. Use the IQR based approach.
# You can ask chatgpt for the function.
In [63]:
# Visualzie the scatter plot again. Don't forget to add titles and labels.

plt.figure(figsize=FIGSIZE)
plt.scatter(ignitions_wo_outliers["mr_distance"], ignitions_wo_outliers["Acres"])
plt.xlabel("Distance to Major Road")
plt.ylabel("Area Burnt in Acres")
Out[63]:
Text(0, 0.5, 'Area Burnt in Acres')
No description has been provided for this image
In [64]:
# After dropping the outlier, create a histogram once again.
# Create a histogram
plt.figure(figsize=(10, 6))

ignitions_wo_outliers.mr_distance.hist(bins=40, color='skyblue', edgecolor='black', alpha=0.7)

# Add titles and labels
plt.title('Histogram of Ignition to Major Road Distance')
plt.xlabel('Ignition to Major Road Distance (miles)')
plt.ylabel('Frequency')

# Add grid
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Add a background color
plt.gca().set_facecolor('#f5f5f5')

# Add a legend (optional)
#plt.legend(['Distance from Ignition to Major Road'])

# Show plot
plt.show()
No description has been provided for this image

Final Report ( Total 30 pts )¶

Chapter 1.¶

  1. Submit your final map visualization for the "YearBuilt" column.
    Ensure your map includes necessary elements such as title, legend, and scale bar.
    Do not turn off the axis.
    Utilize a suitable colormap. ( 3 pts )

  2. Load a new GeoDataFrame, "Lab1Data/Parcel_IV.shp".
    Visualize its NetSecValue column with Scalebar.
    You shouldn't turn off the axis for this map. ( 4 pts )

  3. Report the number of SB parcels selected in the given Knapsack Problem, aimed at maximizing the number of bedrooms within a budget of $1,000,000. ( 1 pt )

  4. Report the number of bedrooms that can be obtained in the given Knapsack Problem, with the budget of $1,000,000. ( 1 pt )

  5. Discuss the presence of null values in the "YearBuilt" column.
    Hint: Utilize the Folium library to overlay parcels with null or not-null values on a basemap. ( 2 pts )

  6. In a knapsack problem, what happens if the objective function sets "MINIMIZE" rather than "MAXIMIZE"? ( 2 pt )

  7. How many bathrooms does the parcel with zero bedrooms have? ( 1 pt )

  8. In a knapsack problem, what happens if there's no constraint? Hint: Comment out the setConstraint row ( 2 pts )

  9. Evaluate whether the current knapsack approach is appropriate or not, and provide reasoning for your conclusion. ( 2 pts )

  10. Design a new Knapsack problem scenario, referring to the description.txt file for details on parcel data columns.
    Explain the scenario you have created. ( 2 pts )

  11. Run the Knapsack Problem code with your custom scenario.
    Visualize the results in an interactive plot using Folium.
    Ensure that the color scheme is different from blue/red.
    (Optional) You can try the different parcels in Isla Vista region (parcl_IV.shp) instead of the parcels used for instruction. ( 4 pts )

Chapter 2.¶

  1. Submit a screenshot of your histogram and scatter plot without outliers.
    Your plots must have title, x and y labels. (4 pts)

  2. What insight can you get out of the map, histogram and scatterplot you created in Chapter 2?
    How can you utilize the insights for the urban wildfire risk management? ( 2 pts )

In [65]:
#Question 2: Load a new GeoDataFrame, "Lab1Data/Parcel_IV.shp".
parcels = gpd.read_file("Lab1Data/parcel_IV.shp")
# Plot parcel data
parcels.plot()
Out[65]:
<Axes: >
No description has been provided for this image
In [66]:
#visualize NetSecValue with scalebar
ax = parcels.plot("NetSecVal", legend=True)
#add a scale bar
scalebar = ScaleBar (1, scale_formatter = lambda value, unit: f"{value}000 ft")
ax.add_artist(scalebar)
Out[66]:
<matplotlib_scalebar.scalebar.ScaleBar at 0x2886cc850>
No description has been provided for this image
In [ ]: